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Abstract 

Simulations often involve the use of model parameters which are unknown 
or uncertain. For this reason, simulation experiments are often repeated 
for multiple combinations of parameter values, often iterating through 
parameter values lying on a fixed grid. However, the use of a discrete 
grid places limits on the dimension of the parameter space and creates 
the potential to miss important parameter combinations which fall in the 
gaps between grid points. Here we draw parallels with strategies for nu- 
merical integration and describe a Markov chain Monte-Carlo strategy for 
exploring parameter values. We illustrate the approach using examples 
from phylogenetics, archaeology, and epidemiology. 

1 Introduction 

Simulation experiments are a fundamental tool of many scientific fields. They 
are used to make model comparisons, to predict future events, carry out sen- 
sitivity analysis and to promote and test hypotheses or methodologies, indeed 
simulation experiments are almost obligatory in papers introducing new meth- 
ods or techniques. They can also be one of the most computationally intensive 
components of a scientific investigation, involving large scale calculations and a 
huge amount of replication. Simulation has also become an important tool for 
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inferring parameters from data (e.g. [18] ). though this is not the application 
that we will be considering here. 

In many fields, the models used in a simulation involve parameters which are 
either unknown or roughly estimated. One of the biggest practical challenges 
for those conducting simulations is deciding which values to use for these model 
parameters. The range of values needs to be broad enough that the experimental 
results have some level of generality, but still within the limits imposed by 
computational resources and time. A standard practice is to fix some values a 
priori and use a discrete grid to iterate over others. Multiple replicates are then 
carried out at each grid point (combination of parameter values). 

Grid-based strategies for exploring a function or space are widely used in 
numerical integration. They typically face the curse of dimensionality: as the 
number of dimensions grows the number of grid points increases exponentially 
[18] . This issue is, to an extent, avoided by Markov chain Monte-Carlo (MCMC) 
methods, where points are sampled randomly within the entire space according 
to an appropriately chosen Markov chain. 

Despite the impact that MCMC methods have had on a wide range of sci- 
entific fields (e.g. [SJ HI]), they do not appear to have been applied to the 
implementation of simulation experiments themselves. A simulation strategy 
which samples parameter stochastically using a Markov chain, rather than it- 
erating through a fixed grid, could be advantageous for several reasons: it by- 
passes, to an extent, the issues of dimensionality; there is less scope for critical 
areas of the parameter space to fall 'between the sample points'; and it may be 
possible to avoid some unnecessary simulations in 'uninteresting' parts of pa- 
rameter space by performing only one replicate per sample point. The MCMC 
algorithms samples more heavily from those area for which the outcome being 
studied has a higher probability, thereby mapping the surface more efficiently 
that, say, sampling from the prior alone. 

There are several obstacles to overcome before MCMC methods can be used 
to conduct simulation experiments in this manner. First, MCMC is a technique 
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for sampling from a distribution, and it is not immediately obvious what that 
distribution should be in this context. Second, MCMC and importance sampling 
algorithms typically require the value for a distribution to be known, at least up 
to a scalar constant. Given the model complexity inherent in many simulation 
experiments, exact or proportional values will not always be available. 

We address the first obstacle by describing a Bayesian-style framework for 
conducting simulation experiments in which the distribution of interest is the 
"posterior" distribution of the parameters conditional on a particular simulation 
outcome. We say "Bayesian-style" because we never work directly with actual 
data and the choice of prior is eventually factored out of the final output. We 
address the second obstacle using clever algorithms of [19] and pj] for conduct- 
ing MCMC and importance sampling in the absence of explicit likelihoods. Our 
methodology therefore overlaps with that used for approximate Bayesian com- 
puting, though we do not require the reduction of data to summary statistics. 

To illustrate our new framework, we compare two workhorses of phylogenetic 
inference, the UPGMA algorithm [25] and the Neighbor- Joining algorithm [23] . 
Both algorithms construct trees (or hierarchies) from distance data, the main 
difference between the two being that UPMGA assumes the mutations accu- 
mulate at a constant rate whereas NJ does not assume a constant rate, at the 
cost of a slightly higher sampling variance. We will use a simple simulation to 
identify situations where UPGMA performs at least as well as NJ, primarily to 
illustrate aspects of our MCMC framework. 

We then describe two additional case studies. The first re-examines an im- 
portant simulation experiment conducted by Allaby pQ. A key question of 
prehistoric agriculture is whether domesticated crops have single or multiple 
origins. Allaby et al pQ used simulations to demonstrate that standard phyloge- 
netic based methods could infer a single origin even given data from (simulated) 
crops with multiple origins. Aspects of Allaby et al.'s model have been contested 
[22] . Here we consider the choice of parameters values, and ask whether the re- 
sults of Allaby et al's study are robust to error in the model parameters. We 
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conclude that they are robust, and use the example to illustrate how our MCMC 
approach can be used to explore multidimensional parameter spaces. 

The final case study comes from epidemiology and uses a complex, and com- 
putationally intensive, simulator which models transmission of diseases through 
contact networks. In this study, we use population tract data for the city of 
Seattle. We consider two strategies for controlling the outbreak of a hypothet- 
ical disease: school closure and the provision of antiviral drugs. We examine 
how these strategies perform relative to each other, varying model parameters 
describing the rate of infection (Ro) and the fraction of the population /„ vac- 
cinated prior to the start of the epidemic. 

2 Methods 

2.1 Bayesian framework 

There are three principal ingredients of any simulation experiment: the simu- 
lator itself; the outcome under study; and the scheme used to select parameter 
values and replicate counts. The simulator takes a vector of parameter values 9 
and generates synthetic data X according to a distribution P (X\6) determined 
by the underlying model. Generally, the relationship between X and 9 is deter- 
mined by an algorithm and actual values of P(X\9) are unavailable. 

Example 

To compare NJ and UPGMA we simulate a "true tree" which is then used to 
evolve synthetic genetic data. The distribution of the tree is governed by two 
parameters: the "scale", s, which controls the height of the tree (and thereby the 
level of variability in the data) and the "skew", 7, which controls the variation 
in mutation rate in different parts of the tree. We used the sequence simulator 
implemented in the Python package PJ^ fWjj to evolve nucleotide sequences along 
this tree and then produced distance estimates from these. Details are in the 
appendix. □ 
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We assume that the goal of the simulation experiment is to study a partic- 
ular outcome, which corresponds to a particular event. For example, if we are 
using simulations to assess a given hypothesis test, the "outcome" of interest 
might be that the test gives a false positive and the goal might be to identify 
parameter values for which the probability of this outcome, a false positive, is 
large. We let 1Z denote the event that the outcome occurred, so P(TZ\9) is the 
probability of the outcome for a given set of parameter values 9. 

Example 

For the NJ-UPGMA comparison the outcome 7Z corresponds to the event that 
the tree produced by UPGMA is as close to the true tree as is the tree produced 
by NJ. We measure closeness using the Robinson-Foulds \2Vj distances between 
each of the UPGMA and NJ trees and the true tree. The outcome 1Z can be 
any event, and could be quite complicated. Below we consider examples where 
1Z corresponds to the result of a hypothesis test, or where 7Z corresponds to one 
strategy for mitigating an epidemic works better than another. □ 

The probability P(7Z\8) for a particular value of 9 is generally estimated by 
conducting multiple simulations with the same parameter values 9 and recording 
the proportion of times 1Z occurs. This is computationally intensive, and limits 
the range of different 9 values which can be considered. This is used in the 
standard 'grid search' approaches for simulations: 

Gl. For all combinations of parameter values 9 

G2. Carry out r simulations using parameters 9 (e.g. r = 100) 

G3. Estimate P(7Z\9) by the proportion of simulations for which 7Z occurs. 

Our approach is to turn the conventional methodology on its head and follow 
a Bayesian strategy. We ask: "if the simulation returns the specified outcome, 
what are the probable values of 9T\ in other words, "what is P(9\7Z)7" . For 
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this distribution to make sense, we need to specify a prior distribution P(9) on 
8. B ayes' rule then gives 



Note that the "likelihood" P(1Z\8) can (usually) not be directly evaluated. 

Samples from the posterior distribution P(8\7Z) indicate parameter values 
for which the outcome is more likely. This can be useful in itself, but it doesn't 
indicate whether the outcome is particularly likely in the absolute sense. For 
example, it may be that UPGMA tree is almost never as close to the true tree 
than the NJ tree and P(7Z\8) is small for all 8. However if the few times it is 
closer occur for the parameter values 8* then P(8* \1Z) will be large. 

In many contexts, it is more useful to infer the function P(1Z\8) as a function 
of 8. This 'likelihood' is also independent of the choice of prior. Below we show 
how to estimate this function using a combination of kernel density estimation 
and importance sampling. 

2.2 Sampling algorithm 

Our algorithm for sampling from P(8\1Z) is based on the "MCMC without like- 
lihoods" algorithm of [15] . Their algorithm takes data V and generates sam- 
ples from the posterior density P(8\T>). It does not use the likelihood function 
P(T>\8) directly but instead simulates values from P(-\8). Let it denote the prior 
on 8. 

Fl. If now at 8 propose a move to 8' according to a transition kernel q(8 — > 8'). 

F2. Generate T>' using model M. with parameters 8' . 

F3. If T> — V , go to F4, and otherwise stay at 8 and return to Fl. 

F4. Calculate 



P{9\K) = P(K\8) 



P(8) 



(1) 



p{ny 
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F5. Accept 8' with probability h; otherwise stay at 8; then return to Fl. 

Marjoram and colleagues proved that the chain produced has the required sta- 
tionary distribution, given an appropriate transition kernel q. 

To apply this algorithm within our context, we replace the event T> = V 
with a generic event 1Z (i.e, the result of a simulation experiment). We also 
switch the order of steps so that a simulation is carried out only if the move has 
not been rejected by the prior and transition kernel. 

51. If now at 8 propose a move to 9' according to a transition kernel q{8 — > 8'). 

52. Calculate 



53. With probability h, go to S4; otherwise stay at 8 and return to SI. 

54. Generate X with distribution P(X\8') using the simulator and assess 1Z 



S5. If 1Z holds, accept 8' and otherwise stay at 8, then return to SI. 

Note that if the transition kernel is symmetric and the prior distribution is 
uniform then S2. and S3, can be deleted. 

With mild conditions on g, the chain produced by the algorithm converges to 
the required stationary distribution P(8\1Z), even though knowledge of the like- 
lihood P(1Z\8) is never required. This property allows us to bypass the need for 
multiple replicates at each 9 value. It also provides a way of sampling through 
the space of 8 values without the need to resort to grid sampling. 

Example 

In figure^a) we give a density plot obtained from 100,000 iterations of the 
MCMC algorithm applied with a (bounded) uniform prior on 8\ and 8i- We see 
that if UPGMA performs at least as well as NJ, then it is likely that 0\ (the 
scale) is high and 02 (skew) is small. This makes sense: UPGMA is known to 




using X. 
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perform better when the substitution rate does not vary across the tree (i.e., the 
tree is clocklike). It is also a lower variance method than NJ so will perform 
better as the data becomes noisier. □ 






Figure 1: Comparison of UPGMA and NJ. a: Probability density function for the 
distribution of skew and scale parameters, given that UPGMA performed at least as 
well as NJ; b: Likelihood function showing the probability that UPGMA performs at 
least as well as NJ; c: frequency with which UPGMA performed better than NJ over 
100 simulations per parameter combination (i.e., a grid-search); d: plot c overlayed on 
plot b for ease of comparison. Darker colour indicates higher values. 



Samples from P(9\1Z) tell us which parameter values were likely, given that 
the simulation resulted in a specified outcome. For example, if a simulation 
was used to examine when a statistical test gave false positives, we could iden- 
tify which parameter values were most likely to lead to a false positive result. 
However, the samples do not tell us how likely the false positives are for those pa- 
rameter values. That is, samples from P(9\1Z) do not provide direct information 
about P(1Z\8). In the UPGMA-NJ simulation, we cannot tell from figure [ij a) 
whether UPGMA is a better method than NJ at any point. 
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From ([I]) we have that 



P{n\6) 



p{e\n)P{n) 



We assume that the prior P(6) is known and estimate P(9\1Z) using a sample 
produced by the MCMC algorithm. The missing ingredient is P(TV), the overall 
probability of the outcome when parameter values are sampled from the prior 
density. In general, the estimation of the normalising constant 



is a challenging statistical and computational problem, even when the likeli- 
hood function P(1Z\9) is available **Citation: German??**. Our approach, 
which follows and uses importance sampling, is a computationally efficient 
approximation. 

We assume that we have sampled a large number of approximately inde- 
pendent parameter vectors 9\, 9%, . . . , 9 n from P(9\1Z). When the dimension of 
the parameter space is not too large (say < 10), a good approximation of the 
posterior distribution can be obtained using kernel density estimates (KDE), 
see |24j . Let K(9) denote the kernel density estimate. 

We sample M new values 6»«, 6»( 2 \ . . . , 9 {M ^ from K. For each we simulate 
a data set X^ l > from P{X\9^) and compute importance weights 




(2) 




(3) 



The importance sampling estimate for P(1Z) is then 



1 



A I 



P(K) 



M 
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We note that repeating the entire estimation procedure for the complement of 
the outcome 1Z C , that is, the outcome that 1Z did not occur, we can estimate 
P(1Z C ). The difference between P(1Z) + P(1Z C ) and 1 gives a test of the quality 
of estimation. 

Given an estimate for P(1Z) we can substitute K(9) for P(9\1Z) in ^ and 
obtain an estimate for P(1Z\9). 

Example 

In the UPGMA-NJ example we calculated a P{7V) value of 0.116. We used this 
and the prior density to estimate the function P(1Z\9), Figure \J^b), which is es- 
sentially a rescaling of the posterior density plot Figure^a). As a comparison, 
Figure [IJ^c ) is an estimate of the same function carried out using a grid of 9 
values and performing 100 replicates of the simulation per grid point. □ 

2.3 Case study I: Assessing the robustness a simulation 
experiment 

The origin of agriculture was one of the defining moments of human history, 
and yet there is still debate over the nature and timing of the domestication 
process [TJ [5J [HI [501 122- Whereas archaeobotanical evidence suggests a 
protracted and complex period of domestication, at least of Fertile Crescent 
cereals, phylogenetic evidence has suggested a single domestication origin. Al- 
laby and colleagues used a series of simulation experiments to demonstrate the 
phylogenetic techniques used could be misleading, and that, under a particular 
model of plant domestication, the admixture of two populations that emerged 
in independent domestication events can appear monophyletic [U [2] (see also 
[22]). 

The model of domestication and admixture used in the simulations of [1] is 
governed by seven parameters controlling recombination, population sizes, and 
bottleneck durations. Four of the seven parameter values were fixed in advance 
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using estimates from the literature, while the remaining three were set to a 
limited number of different values. Their experiment therefore demonstrates 
that the phylogenetic method is misleading for a particular choice of parame- 
ter values. We used our simulation framework to investigate how robust their 
conclusion was to variation in the simulation parameter values. 

The model used for these simulations is described in detail in [5] (see also 



Figure SI). Briefly, frequencies for twenty chromosomes were established for 
each of two "wild- type" populations of size n w . In order to simulate a bot- 
tleneck, rit, individuals were then drawn (with replacement) from each of the 
wild-type populations, and this small population size was maintained for tf, gen- 
erations. In each new generation, when a new genome was created, each pair 
of chromosomes underwent a recombination event with probability p r . Each 
population was then allowed to expand to rid individuals over the course of a 
single generation, and this new domesticated population size was maintained 
for td generations, then the two domesticated populations were merged and the 
simulation continued for another t/, generations with this admixed population 
(of size rid). At the end of the simulation, a NJ tree was constructed from a 
Dice |10j distance matrix estimated from two individuals sampled from each 
of the wild-type populations, the domesticated populations, and the admixed 
population (a total of ten individuals). If the inferred tree contained a split 
separating the individuals from the admixed population from the others, they 
were considered to appear (erroneously) monophyletic. 

We performed simulations using the seven parameters described above 9 = 
(n w ,nb,tb,rid,td,th,Pr) and an "outcome" event 1Z corresponding to the event 
that the phylogenetic method falsely inferred monophyly. Prior densities for all 
parameters were uniform on intervals centred on the values used in [T] . We used 
the MCMC to sample parameter values 9 from P(9\1Z), and then used the KDE 
and importance sampling techniques to estimate P(1Z\9) as a function of 9. See 
the appendix for precise details. 

See the appendix for precise details about the simulation experiment. 
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2.4 Case study II: Comparing competing strategies for 
epidemics 

Our second case study compares two different strategies for mitigating epi- 
demics. We use the FluTE software [7J, which incorporates detailed demo- 
graphic data to model contact networks and the spread of an epidemic through 
a population. In this instance, the population modelled is the city of Seattle. 

We considered two responses to the outbreak of an epidemic: closing schools 
and administering antiviral drugs. Using the simulator we can simulate the 
progression of the epidemic when the first strategy is implemented and then 
repeat the experiment with the same random seed but with the second strategy. 

At first, we let 1Z be the outcome that the total number of symptomatic 
individuals is smaller when the schools were closed than when antiviral drugs 
were administered. We consider two model parameters, the basic reproduction 
number of the virus (Ro) and the fraction of the population that was vaccinated 
prior to the start of the epidemic (/„). The goal was to see which strategy 
worked best over different values for these parameters. 

We also consider another measure of success, the peak (rather than total) 
number of symptomatic individuals. To this end we repeated the analysis, 
this time with 7Z denoting the event that the peak number of symptomatic 
individuals was smaller when schools were closed than when antiviral drugs 
were administered. 

See the appendix for precise details about the simulation experiment. 

3 Results 

3.1 Case study I: Assessing the sensitivity of a simulation 
experiment 

Using the MCMC algorithm, kernel density estimation and the importance sam- 
pling method we obtained an estimate of the likelihood function P(1Z\8), which 
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gives the probability of obtaining a (false) monophyly result for every choice of 
the six parameters. Figure [2] gives the average likelihood values for every choice 
of single parameter and every pair of parameters. 



a. m 




n, 



Figure 2: Likelihood that a hybrid population erroneously appears to have a mono- 
phyletic origin. Likelihood functions showing the probability of inferring a mono- 
phyletic origin over single and pairs of parameters, rib', size of the bottleneck popu- 
lations; tb' duration of the bottleneck (in generations); n^: size of the domesticated 
populations (following post-bottleneck expansion as well as following hybridization); 
td'- duration of domestication (prior to hybridization, in generations); th- time after 
hybridization (in generations); p r : probability of a recombination event; n m : wild-type 
population size. See text for a description of how these parameters were used. 

From the figure we see that variation in only two out of the six variables 
have a significant impact on the probability of monophyly: rid, the size of the 
domesticated population and t^, the amount of time elapsing following admix- 
ture of the two domesticate populations. The probability of the test returning 
a false positive increases as the population size decreases or when the number 
of generations increases. Both trends are consistent with theory, as both de- 
creasing the population size and increasing the number of generations increases 
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the chance that lineages from one of the domesticates have become fixed in the 
admixed population. A plot of probability of monophyly versus genetic diversity 



(Supplementary Figure S2) suggests that, for these simulations, the diversity is 
a good predictor of a false positive test. 

Figure [3] gives a more detailed representation of the probability of monophyly 
as a function of domesticated population size and bottleneck duration. Here we 
have fixed the other parameters at the same values used in the simulation of 
PQ, and marked the various parameter values they used for the domesticated 
population size and the number of generations since admixture (n^ — 20, % = 
10, i<j = 20,n IU = 10000). We fixed p r , the recombination probability, at 0.1, 
one of four values for this parameter used by Allaby and colleagues. 




H generations following admixture 



Figure 3: Likelihood that a hybrid population appears monophyletic: Domesticated 
population size and post-admixture time. Likelihood function showing the probability 
of inferring a monophyletic origin estimated from the seven-parameter KDE by setting 
the remaining five parameters to the values used by Allaby and colleagues [I], as 
described in the text. 



We see immediately that over the range of population size values considered, 
the probability of a false positive is generally high (>0.5). Note, however, the 
strong dependency on the size of the domesticated population size. Ross-Ibarra 
and colleagues suggest that the small effective population sizes used in [T] are 
partially responsible for the results described [22]. In response, Allaby and 
colleagues stated that the bottleneck parameters used in their model reflect 
established dimensions from biological populations [2J. 
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3.2 Case study 2: Comparing competing strategies for 
epidemics 

We applied our simulation framework to influenza epidemic simulation in order 
to assess the relative effectiveness of a non-pharmaceutical (school closure) and 
pharmaceutical (antiviral drugs) mitigation strategy. Simulations were set up 
using two different criteria to assess the success of a mitigation strategy over 
another: in one case, a strategy was considered successful if the peak number 
of symptomatic individuals was at most equal to the number when the other 
strategy was used; in the other, a strategy was successful if the total number 
of symptomatic individuals was at most equal to the number when the other 
strategy was used. 

Figure [4ji and b show the distribution of the Ro and vaccinated fraction pa- 
rameters sampled when success was based on the peak number of symptomatic 
individuals, when closure of schools or administration of antiviral drugs per- 
formed best, respectively. School closure (Figure [4^,) tended to reduce the peak 
number of cases relative to antivirals when either much of the population was 
vaccinated before the onset of the epidemic and the virus' Ro was low, or when 
little of the population was vaccinated and the Ro was high. The region corre- 
sponding to low to moderate Ro that was rarely sampled when school closure 
was the more effective strategy. Antiviral drugs (Figure [4Jd) also reduced the 
peak number of symptomatic individuals relative to closure of schools when the 
population was highly vaccinated and the Ro was low, but also when values 
for both parameters were small, moderate, or large. This corresponds almost 
exactly to the region that was rarely sampled when school closure was more 
effective than administration of antiviral medication. 

Results when the acceptance criterion was a relative reduction in the total 
number of symptomatic individuals (Figure |4j; and d) differed greatly. In this 
case, school closure (Figure |4ji) tended to be more effective than antiviral drugs 
across a broad range of Ro only when the population was highly vaccinated, but 
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Figure 4: Comparison of influenza epidemic response measures. Likelihood 
functions for simulated epidemics with two parameters, the basic reproductive 
number of the virus (Ro), and the vaccinated fraction of the population (/„). a: 
Probability that school closure reduces the peak number of cases to a greater ex- 
tent than antiviral medication; b: Probability that antiviral medication reduces 
the peak number of cases to a greater extent than school closure; c: Probability 
that school closure reduces the cummulative number of cases to a greater extent 
than antiviral medication; d: Probability that antiviral medication reduces the 
cumulative number of cases to a greater extent than school closure. 

only for low R when the population was less highly vaccinated. In contrast, 
antiviral drugs (Figure were more effective than closure of schools across a 
broad range of Ro when the population was largely unvaccinated, and only for 
higher R values when the population was highly vaccinated. 

Our FluTE simulations that used antiviral drugs as a response to the ascer- 
tainment of an influenza infection assumed that drugs were dispensed to both 
symptomatic individuals as well as their family members. FluTE models the 
effect of antiviral medications on an individual by decreasing three probabili- 
ties: the probability that a symptomatic individual will transmit the infection; 
the probability that an infected individual will become syptomatic in the first 
place; and the probability that an uninfected individual will become infected. 
Therefore, simulations using the antiviral strategy should show a reduction in 
the number of infections both within families of an infected individual and be- 
tween members of an "infected family" and others. When the population is 
considered as a contact network, with individuals as vertices and probability of 
infection as edge weights, it is clear that antiviral drugs decrease the weights 
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of edges incident to infected individuals and their family members (who would 
have an increased risk of infection, due to increased contact probability with 
the infected individual). 

On the other hand, school closure temporarily eliminates from the contact 
network those edges corresponding to within-school contacts, while at the same 
time increasing the weight of edges corresponding to daytime neighborhood 
contacts [7] . Vertices corresponding to infected students will therefore generally 
have fewer incident edges, and if these edges largely correspond to vaccinated 
individuals, or if the probability of infection is generally low (i.e., if R is small), 
then many of these students may not infect others, and the total number of 
infections may be reduced. However, if it is very likely that an infected student 
will nonetheless infect at least one other person even when schools are closed 
(e.g., a neighbour or family member who has workplace contacts), and the 
infection can subsequently be transmitted to non-school environments with the 
same probability whether or not schools have been closed. The removal of 
within-school edges would therefore be expected to reduce the rate at which the 
epidemic will spread (and therefore the peak number), but not the total number 
of infected individuals. These results are consistent with other studies, in which 
school closure has been shown to reduce the peak number of influenza infections 
to a greater extent than the total [3 H7] . They are also consistent with the 
observation that school closure is less effective with higher Ro epidemics [T5] , 

4 Discussion 

Markov chain Monte Carlo algorithms have made a huge impact on numerical 
integration and statistical inference [18] . Here we have presented a framework 
for conducting simulation experiments which uses MCMC to consider different 
combinations of parameter values. Our framework provides an alternative to 
a conventional 'grid-based' strategy, where the simulation experiment iterates 
through all combinations of a fixed set of parameter values. 
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The advantages (and disadvantages) of using a sampling based strategy to 
select parameter values are similar to the advantages (and disadvantages) of 
using Monte Carlo methods for numerical integration. Our approach is less 
vulnerable to the 'curse' of dimensionality and, importantly, does not rely on a 
priori selection of fixed grid values, values which could completely miss regions 
of interest. 

There are two main stages in the approach we describe. First, we use an 
MCMC algorithm to sample from a 'posterior' distribution of the parameters 
conditional on a certain simulation outcome. For example, when comparing 
different strategies for dealing with epidemics we conditioned on the closing 
schools working better than administering antivirals, and then sampled from the 
distribution of two parameters: the reproduction number and the proportion of 
the population vaccinated. The resulting density indicates for which areas of 
the parameter space the school closure strategy is most likely to work better 
than administering antivirals. 

The second stage takes the output of the MCMC and uses importance sam- 
pling to estimate the 'likelihood' surface giving the probability of different out- 
comes for each parameter value. For example, in the epidemiological example 
the likelihood surface describes the probability that one strategy works bet- 
ter than another, whereas the output of the MCMC only indicates where one 
strategy was likely to work best. 

While we have demonstrated that the framework we introduce is both feasi- 
ble and practical, it is clear that considerable advances can be made improving 
the computational and statistical efficiency. The MCMC without likelihoods al- 
gorithm which we use is the basis of approximate Bayesian computation (ABC), 
and there has been a great deal of work improving the efficiency of ABC sam- 
pling algorithms ^ 4, 8j, most of which could be applied here. Our application 
of density estimation could likewise be improved and refined. Indeed, it is con- 
ceivable that there may be some way to use the simulation results from the 
MCMC step to sidestep the need for importance sampling, though we could not 
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see how to do this without introducing a substantial bias. 

We present the results from three case studies: a comparison of phylogenetic 
methods; a test of robustness for a simulation experiment in genetic archaeology; 
and an exploration of the effectiveness of two mitigation strategies for epidemics. 
In all three cases, our approach explored the parameter space without having to 
specify a fixed set of values. We obtained detailed estimates of the effect different 
parameter combinations had on the probability of the outcomes, estimates which 
could be used for further studies or to test hypotheses. 

Finally, we stress that the MCMC framework which we have introduced is 
extremely straightforward to implement. The basic sampler requires no more 
code than a grid based strategy, and the estimation of the likelihood function 
uses standard took[^] The framework is completely general and, while the appli- 
cations explored here are all biological in nature, there is nothing to prevent the 
framework being applied in any context that simulation experiments are carried 
out. 
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A Simulation details 



A.l Comparison of distance-based phylogenetic inference 
methods 

We compared two distance-based phylogenetic inference methods, UPGMA [25] 
and NJ [25] . Trees on thirty taxa were simulated under a Yule model using the 
Python package DendroPy version 3.9 [25]. A pair of parameters, 9, describ- 
ing the tree's non-clocklikeness ("skew"; 7) and expected root-to-tip distance 
("scale"; s) were then used to adjust the edge lengths, as follows. For each edge 
in the tree, a uniform random number u between —7 and 7 was chosen, and the 
edge's length was multiplied by exp u. The edges of the tree were then scaled 
so that the height of the tree was equal to s. 

Nucleotide sequences of length 1000 were then simulated along this tree 
using the Python package P4 [13] under the Jukes-Cantor model [T5]- Pairwise 
distances between sequences were calculated using the Jukes-Cantor model, and 
trees were inferred using the NJ and UPGMA methods. The Robinson-Foulds 
[2T] distance between each tree and the true tree (i.e., the tree along which 
sequences were simulated) was calculated. 

We used this simulation strategy in the MCMC framework described above 
for 100,000 steps, saving every hundredth sample. Both parameters were drawn 
from uniform proposal distributions; 7 was uniform on (0, log 10), s was uni- 
formly distributed on (0.02,1). We then fitted a KDE to these 1,000 samples 
and used the importance sampling method described above to estimate the 
probability P (7Z) that UPGMA performs at least as well as NJ. One thousand 
additional simulations were performed in the application of the importance sam- 
pling method. 

We also used this simulation strategy in a grid-search approach by dividing 
each parameter in 20 intervals and performing simulations with the midpoints 
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for all pairs of intervals (i.e., for all pairs of (7, s), where 

{13 39 1 

-log 10, -log 10,..., -log 10] 

and s e {0.0245, 0.0735, . . . , 0.9555}). For each pair (7, s), 100 simulations were 
performed, and the number of times UPGMA performed at least as well as NJ 
was counted. 



A. 2 Investigation of erroneously inferred monophyly in 
domesticated populations 

Populations were simulated as described in lj, implemented in custom software 
available from http://leigh.net.nz/software.shtml DO THIS. A schematic of 
the model used in these simulations, described by seven parameters, is shown 
in Supplementary Figure |S1| Two sequences were sampled from each of the 
wildtype populations, from the two domesticated populations, and from the 
hybrid population. Dice distances [10] were calculated between all pairs of 
sequences, and a tree was inferred using NJ [23 . If a bipartition was found 
in the inferred tree in which the two sequences from the hybrid population 
appeared on one side of the split to the exclusion of all other sequences, the 
population was considered monophyletic. 

The probability distribution of the seven parameters, given that the hybrid 
population appeared monophyletic, was sampled using the MCMC method de- 
scribed here for 1,000,000 steps, sampling every 200 steps. All parameters were 
drawn from uniform proposal distributions with ranges given in Supplementary 



Table S2 We then fitted a KDE to these 5,000 samples in order to apply the 
importance sampling method described here to estimate the probability P (TV) 
of erroneously inferring a monophyletic origin for the hybrid population. For 
importance sampling, an additional 10,000 simulations were performed. 
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A. 3 Simulation of responses to an influenza epidemic 

Mitigation strategies for influenza epidemics were explored using the FluTE 
epidemic simulator version 1.13. [7J. We used the population tract data for 
the city of Seattle, which is distributed with FluTE, for our simulations. Two 
variable parameters were explored in these simulations: the basic reproduction 
number of the virus (Ro) and the fraction of the population that was vaccinated 
prior to the start of the epidemic (/»). Other parameters used for simulations 
are summarized in Supplementary Table |S1| Vaccinated individuals were dis- 
tributed uniformly among age groups. For each combination of parameters, two 
simulations were run with the same seed (for pseudo-random number genera- 
tion) . When the number of symptomatic individuals reached the ascertainment 



threshold (0.8%, see Supplementary Table SI I, one of two possible mitigation 
strategies was initiated: either all schools were closed for a period of 14 days, 
or infected individuals and their household members were given antiviral drugs. 
The performance of the mitigation strategies was assessed based on either the 
peak number of symptomatic individuals, or the total number of symptomatic 
individuals. 

We used the MCMC framework described to explore the performance of the 
two mitigation strategies over the range of the two parameters (Ro and f v ) by 
sampling from the parameter space when school closure performed better than 
antiviral drugs, using each of the performance criteria (i.e., when either the peak 
or total number of symptomatic individuals was reduced relative to simulations 
in which antiviral drugs were administered). For comparison, we also sampled 
from the parameter space where antiviral medication was a more effective mit- 
igation strategy using the relative reduction in both peak and total number of 
symptomatic individuals as a performance criterion. In each case, chains were 
run for 100,000 steps, sampling every 20 steps. Proposal distributions for both 
parameters were uniform. Ro was bounded by [1.2,3] and f v was bounded by 
[0,0.7]. We then used the importance sampling method described here to esti- 
mate the probability that each mitigation strategy outperforms the other under 
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each of the two performance criteria, using 1,000 simulations in each case. 



B Figure Legends 

Figure [TJ Comparison of UPGMA and NJ 

a: Probability density function for the distribution of skew and scale parameters, 
given that UPGMA performed at least as well as NJ; b: Likelihood function 
showing the probability that UPGMA performs at least as well as NJ; c: fre- 
quency with which UPGMA performed better than NJ over 100 simulations per 
parameter combination (i.e., a grid-search); d: plot c overlayed on plot b for 
ease of comparison. Darker colour indicates higher values. 

Figure [2} Probability of erroneously inferring a monophyletic 
origin for a hybrid population 

Likelihood functions showing the probability of inferring a monophyletic origin 
over single and pairs of parameters, rib'- size of the bottleneck populations; 
<{,: duration of the bottleneck (in generations); rid', size of the domesticated 
populations (following post-bottleneck expansion as well as following hybridiza- 
tion); td'. duration of domestication (prior to hybridization, in generations); th- 
time after hybridization (in generations); p r : probability of a recombination 
event; n w : wild-type population size. See text for a description of how these 
parameters were used. 

Figure [3f Probability of erroneously inferring monophyly: 
domesticated population size versus time following admix- 
ture 

. Likelihood function showing the probability of inferring a monophyletic origin 
estimated from the seven-parameter KDE by setting the remaining five param- 
eters to the values used by Allaby and colleagues pQ, as described in the text. 
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Figure [4} Comparison of influenza epidemic response mea- 
sures 

Likelihood functions for simulated epidemics with two parameters, the basic 
reproductive number of the virus (Ro), and the vaccinated fraction of the pop- 
ulation (f v ). a: Probability that school closure reduces the peak number of 
cases to a greater extent than antiviral medication; b: Probability that antivi- 
ral medication reduces the peak number of cases to a greater extent than school 
closure; c: Probability that school closure reduces the cummulative number of 
cases to a greater extent than antiviral medication; d: Probability that antivi- 
ral medication reduces the cumulative number of cases to a greater extent than 
school closure. 



Supplementary Figure SI Model from [2J used for plant 



domestication simulation experiments 

Simulations were performed by sampling bottleneck populations from each of 
two separate wildtype populations. Following the bottleneck period, each bottle- 
neck population expanded in size to form a domesticated population. Following 
the period of domestication these populations were then merged to form the hy- 
brid population. The simulation continued for the duration of the hybridization 
period. 



Supplementary Figure \S2\ Relationship between erroneous 
inference of monophyly and genetic diversity 

The probability of inferring monophyly was plotted versus the genetic diversity 
of a simulated hybridized population for 1,000 sets of parameters randomly 
selected from within the range of the prior. Probabilities were estimated from 
the likelihood function shown in Figures [2] and [3j 
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Supplementary Figures 



Figure SI: Model from [5] used for plant domestication simulation experiments. 
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Figure S2: Probability that the hybrid population appears monophyletic as a 
function of genetic diversity 
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Supplementary Tables 



Table SI: Parameters used in all FluTE simulations^ 

a. Pharmaceutical efficacy parameters 

Pharmaceutical Susceptibility Infectiousness Illness, given infection 

Vaccine 0.4 0.4 0.67 

Anti-viral 0.3 0.62 0.6 



b. Other parameters 
Parameter Value 

Infected (initial) 10 

Infected (seeded daily) 1 

Simulation duration 180 days 

Ascertainment fraction 0.8 

Ascertainment delay 1 day 

Response threshold 0.01 

Response delay 7 days 

Pre-epidemic strategy vaccination (uniform) 

t For parameters not listed, default values were used. 



Table S2: Bounds on parameters used in simulations based on the model of 
Allaby et al [2] 

Parameter Minimum Maximum 



n w 1000 20000 

n b 10 39 

t b 5 25 

n d 40 400 

t d 10 50 

t h 25 400 

Pr 1 
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